Quantifying copy number variations using a hidden Markov model with inhomogeneous emission distributions.

Authors

  • Kenneth Jordan McCallum
  • Ji-Ping Wang
Abstract

Copy number variations (CNVs) are a significant source of genetic variation and are frequently associated with diseases such as cancers and autism. High-throughput sequencing data are increasingly used to detect and quantify CNVs; however, the distributional properties of the data are not fully understood. A hidden Markov model (HMM) is proposed that uses inhomogeneous emission distributions based on negative binomial regression to account for sequencing biases. The model is tested on whole-genome sequencing data and simulated data sets. An algorithm for CNV detection is implemented in the R package CNVfinder. The model based on negative binomial regression is shown to fit the data well and to perform competitively with methods based on normalization of read counts.
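
To make the idea in the abstract concrete, the following is a minimal sketch of an HMM whose emission distribution changes from window to window: each window's expected read count comes from a log-linear (negative binomial regression style) model of a covariate, scaled by the hidden copy number. It is not the CNVfinder implementation or its API. The function names, the single GC-content covariate, the coefficients `beta0` and `beta_gc`, the dispersion `size`, and the transition probability `stay` are all illustrative assumptions, and in practice such parameters would be estimated from data rather than fixed.

```python
import math


def nb_logpmf(k, mean, size):
    """Log PMF of a negative binomial with the given mean and dispersion `size`."""
    p = size / (size + mean)
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(p) + k * math.log1p(-p))


def viterbi_cnv(counts, gc, copy_states=(1, 2, 3),
                beta0=4.0, beta_gc=1.0, size=20.0, stay=0.99):
    """Most likely copy-number path for per-window read counts.

    Hypothetical model: the diploid mean in window t is
    exp(beta0 + beta_gc * (gc[t] - 0.5)), and copy number c scales it by c / 2.
    Copy number 0 is left out because this simple form needs a positive mean.
    """
    n = len(copy_states)
    switch = (1.0 - stay) / (n - 1)
    log_trans = [[math.log(stay if i == j else switch) for j in range(n)]
                 for i in range(n)]

    def log_emit(t, s):
        # Inhomogeneous emission: the mean depends on the window's covariate.
        mu = math.exp(beta0 + beta_gc * (gc[t] - 0.5)) * copy_states[s] / 2.0
        return nb_logpmf(counts[t], mu, size)

    # Initialise with a uniform prior over states.
    score = [math.log(1.0 / n) + log_emit(0, s) for s in range(n)]
    back = []
    for t in range(1, len(counts)):
        new_score, ptr = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: score[i] + log_trans[i][j])
            ptr.append(best)
            new_score.append(score[best] + log_trans[best][j] + log_emit(t, j))
        score = new_score
        back.append(ptr)

    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [copy_states[s] for s in path]
```

Calling `viterbi_cnv(counts, gc)` with two equal-length lists (read counts and GC fractions per window) returns one copy-number call per window. The negative binomial is used here because, unlike the Poisson, it allows the variance to exceed the mean, which is the usual motivation for it with sequencing counts.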

Similar resources

Hidden Markov Model-Based CNV Detection Algorithms for Illumina Genotyping Microarrays

Somatic alterations in DNA copy number have been well studied in numerous malignancies, yet the role of germline DNA copy number variation in cancer is still emerging. Genotyping microarrays generate allele-specific signal intensities to determine genotype, but may also be used to infer DNA copy number using additional computational approaches. Numerous tools have been developed to analyze Illu...

Hardy-Weinberg equilibrium revisited for inferences on genotypes featuring allele and copy-number variations

Copy number variations represent a substantial source of genetic variation and are associated with a plethora of physiological and pathophysiological conditions. Joint copy number and allelic variations (CNAVs) are difficult to analyze and require new strategies to unravel the properties of genotype distributions. We developed a Bayesian hidden Markov model (HMM) approach that allows dissecting...

Bayesian non-parametric hidden Markov models with applications in genomics

We propose a flexible non-parametric specification of the emission distribution in hidden Markov models and we introduce a novel methodology for carrying out the computations. Whereas current approaches use a finite mixture model, we argue in favour of an infinite mixture model given by a mixture of Dirichlet processes. The computational framework is based on auxiliary variable representations o...

A generalization of Profile Hidden Markov Model (PHMM) using one-by-one dependency between sequences

The Profile Hidden Markov Model (PHMM) can be poor at capturing dependency between observations because of the statistical assumptions it makes. To overcome this limitation, the dependency between residues in a multiple sequence alignment (MSA) which is the representative of a PHMM can be combined with the PHMM. Based on the fact that sequences appearing in the final MSA are written based on th...

Using a hidden Markov model to predict new tuberculosis cases in Hamadan Province based on cases registered during the years 1384-94 (Solar Hijri calendar)

Background and Objectives: Tuberculosis is a chronic bacterial disease and a major cause of morbidity and mortality. It is caused by Mycobacterium tuberculosis. Awareness of the incidence and number of new cases of the disease is valuable information for revising the implemented programs and development indicators. Time series and regression are commonly used models for prediction but these m...

Journal:
  • Biostatistics

Volume 14, Issue 3

Pages: -

Publication date: 2013